Unsupervised prosody labeling for constructing Mandarin TTS
Abstract
This paper introduces an unsupervised prosody labeling method for preparing a large speech corpus used in developing a Mandarin Text-to-Speech system. Adopting a four-layer prosody hierarchy, the proposed method performs unsupervised segmental clustering that iteratively segments spoken utterances into strings of prosodic constituents and models the patterns of the segmented constituents using both prosodic and linguistic features. The experimental results showed that the proposed method could effectively label important prosodic cues and thereby improve prosody prediction in an HMM-based text-to-speech system. The method is therefore promising and could be widely applied to labeling other large speech corpora.
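The alternation between segmenting and modeling described in the abstract can be illustrated with a toy sketch. This is not the authors' method: it assumes a single prosodic feature (pause duration at each syllable juncture), a nearest-mean model per break level, and four hypothetical break levels numbered 0 to 3, whereas the paper uses both prosodic and linguistic features. The function names `label_breaks` and `segment` are made up for illustration.

```python
from statistics import mean


def label_breaks(pauses, init_centers, n_iter=20):
    """Alternate between labeling junctures and re-estimating level models.

    pauses: pause duration (seconds) at each syllable juncture.
    init_centers: initial mean pause per break level (assumed values).
    """
    centers = list(init_centers)
    labels = []
    for _ in range(n_iter):
        # Segmentation step: assign each juncture to the closest break level.
        new_labels = [
            min(range(len(centers)), key=lambda k: abs(p - centers[k]))
            for p in pauses
        ]
        if new_labels == labels:
            break  # labels stopped changing, so the loop has converged
        labels = new_labels
        # Modeling step: re-estimate each level's mean pause from its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(pauses, labels) if lab == k]
            if members:
                centers[k] = mean(members)
    return labels, centers


def segment(syllables, labels, level):
    """Split syllables into constituents at junctures labeled >= level."""
    units, current = [], []
    for i, syl in enumerate(syllables):
        current.append(syl)
        # Juncture i sits between syllable i and syllable i + 1.
        if i < len(labels) and labels[i] >= level:
            units.append(current)
            current = []
    if current:
        units.append(current)
    return units


syllables = ["s1", "s2", "s3", "s4", "s5", "s6", "s7"]
pauses = [0.0, 0.01, 0.3, 0.02, 0.0, 0.8]
labels, centers = label_breaks(pauses, init_centers=[0.0, 0.1, 0.4, 1.0])
units = segment(syllables, labels, level=2)
```

Here the seven syllables and six pause values are invented data. A realistic system would replace the nearest-mean rule with statistical models over several features, but the loop structure (label, re-estimate, repeat until the labels stop changing) is the same idea.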
Similar resources
Advanced unsupervised joint prosody labeling and modeling for Mandarin speech and its application to prosody generation for TTS
Motivated by the success of the unsupervised joint prosody labeling and modeling (UJPLM) method for Mandarin speech on modeling of syllable pitch contour in our previous study, in this paper, the advanced UJPLM (A-UJPLM) method is proposed based on UJPLM to jointly label prosodic tags and model syllable pitch contour, duration and energy level. Experimental results on the Sinica Treebank corpus...
Unsupervised joint prosody labeling and modeling for Mandarin speech.
An unsupervised joint prosody labeling and modeling method for Mandarin speech is proposed, a new scheme intended to construct statistical prosodic models and to label prosodic tags consistently for Mandarin speech. Two types of prosodic tags are determined by four prosodic models designed to illustrate the hierarchy of Mandarin prosody: the break of a syllable juncture to demarcate prosodic co...
Novel eigenpitch-based prosody model for text-to-speech synthesis
Prosody is an inherent supra-segmental feature in speech that human speakers employ to express, for example, attitude, emotion, intent and attention. In text-to-speech (TTS) systems, high naturalness can only be achieved if the prosody of the output is appropriate. The importance of prosody is even more crucial for tonal languages, such as Mandarin Chinese, in which the tone of each syllable is ...
The Toshiba Mandarin TTS System for the Blizzard Challenge 2009
This paper introduces the Toshiba Mandarin Text-to-Speech (TTS) system submitted to the Mandarin benchmark of the Blizzard Challenge 2009. The basic framework is unchanged from the 2008 system, and we modify the system in several aspects: automatically finding bad units in the database when preparing the speech corpus, adding a G2P procedure after the text analysis to increase accuracy of the pr...
Perceptually based automatic prosody labeling and prosodically enriched unit selection improve concatenative text-to-speech synthesis
Prosody is an important factor in the quality of text-to-speech (TTS) synthesis. Typically, acoustic parameters such as f0 and duration are the only variables related to prosody that are used to determine unit selection. Our study explored adding the explicit use of linguistically and perceptually motivated prosodic categories in unit selection-based TTS. One of our goals was to automate the pro...